Meta-Analysis of Untargeted Metabolomics from Heart Failure Patients

Background

Heart failure (HF) is a leading cause of death, affecting approximately 64 million people globally. While the incidence of HF is relatively stable across countries, the total number of HF patients is rising as populations age. Many articles have been published on the role of the microbiome in recovery after HF, but this human research has not been systematically combined for analysis. This meta-analysis aims to bridge that gap by analyzing previously published untargeted metabolomics studies of human heart failure patients to understand whether microbially mediated metabolites are important for heart failure development or recovery.

Meta-Analysis Methods and this Repo

This RMarkdown script serves as an analysis of the metabolomics data downloaded from previously published studies. The data to be included in this analysis had to meet the following criteria:

  1. Be a study of adult humans with heart failure
    • studies of children or of patients at risk for heart failure were disqualified
  2. Conduct untargeted metabolomics
    • any sample type (plasma, serum, stool, etc.) is included
    • any metabolomics method is included so long as it is untargeted
    • any targeted metabolomics method is excluded (e.g. stable-isotope labelled metabolites)
  3. Have publicly available data
    • the data must be published to a public repository, like MetaboLights.
    • specifically, we selected studies that published processed data, since that data has (in theory) already been compared to internal standards for peak normalization and identification.
    • Data had to have a metadata file that labelled which samples correspond to heart failure versus control samples. Metadata files inconsistently included other information, like age or sex of patients, so that information is not included in this analysis.
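The criteria above map directly onto the yes/no screening columns plotted later in this document. Here is a minimal sketch of that filter, using illustrative short values (the real screening sheet stores longer answer strings, e.g. "Yes, study of heart failure."):

```r
## toy screening sheet: one row per screened paper, yes/no answers per criterion
screened <- data.frame(
  heart_failure_yn  = c("yes", "yes", "no"),
  human_yn          = c("yes", "yes", "yes"),
  metabolomics_yn   = c("yes", "no",  "yes"),
  data_availability = c("yes", "yes", "yes")
)

## a paper is included only if it passes all four criteria
included <- subset(
  screened,
  heart_failure_yn == "yes" & human_yn == "yes" &
    metabolomics_yn == "yes" & data_availability == "yes"
)
nrow(included)  # number of papers meeting all four criteria
```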

Screening process

Let’s review how many papers qualified out of the total number screened for this meta-analysis and systematic review. This part of the code is included so that others can see how we screened and identified papers from our search terms. Note, however, that conflicts between reviewers’ screening results were resolved in an external document, not in this code, so that step is not represented here.

We also want to understand the contribution of each lab member to this massive effort. Let’s look at how many papers each lab member reviewed.
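That tally can be sketched with base R, assuming a screening sheet with one row per paper and a `reviewer` column of initials (the initials and counts below are invented):

```r
## toy screening sheet: one row per screened paper
screened <- data.frame(
  paper_id = 1:6,
  reviewer = c("EW", "AB", "AB", "CD", "EW", "AB")  # hypothetical initials
)

## papers reviewed per lab member, most prolific first
## (base-R equivalent of dplyr::count(screened, reviewer, sort = TRUE))
reviewer_counts <- sort(table(screened$reviewer), decreasing = TRUE)
reviewer_counts
```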

Let’s dig deeper into the papers that didn’t make it (since most of them did not qualify for inclusion):

dat_plot <- places1 %>%
  ungroup() %>%
  select(heart_failure_yn, human_yn, metabolomics_yn, data_availability)

## plot
mydat <- dat_plot %>%
  make_long(heart_failure_yn, human_yn, metabolomics_yn, data_availability)

reagg <- mydat %>%
  dplyr::group_by(node) %>%  # group by node and tally its frequency
  tally()

mydat2 <- merge(mydat, reagg, by = "node", all.x = TRUE)
mydat2 %>%
  ggplot(aes(x = x,
             next_x = next_x,
             node = node,
             next_node = next_node,
             fill = factor(node),
             label = paste0(node, " = ", n))) +  # label each node with its count
  geom_sankey(flow.alpha = 0.5,        # flow transparency
              node.color = "black",    # node outline color
              show.legend = FALSE) +
  geom_sankey_label(size = 4, fill = "white") +  # node labels
  theme_bw() +
  theme(axis.title = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank(),
        legend.position = "none") +
  scale_fill_viridis_d(option = "inferno") +
  labs(title = "Systematic Review: Screening Results")
#ggsave("supp_fig1_screening_results.png", dpi = 300, height = 7, width = 7.8)
print("Not shown are the articles not in English.")
print("6 articles here claimed to have data available, but only 3 of those 6 had data in a usable format.")

Let’s look at papers that nearly qualified. This is ugly, so I am not including this code chunk in the knit document, but you can run it interactively to investigate the details of papers excluded at each step.

lilb <- dat_secondary1 %>%
    filter(!trimws(toupper(reviewer)) %in% c("EW", "ENGLISH"))  # drop all casing/spacing variants of "EW" and "English"
table(lilb$heart_failure_yn)
lilb2 <- lilb %>% filter(heart_failure_yn == "Yes, study of heart failure.")
table(lilb2$human_yn)
lilb3 <- lilb2 %>% filter(human_yn== "yes, human adult study (or combination human + animal)")
table(lilb3$metabolomics_yn)
lilb4 <- lilb3 %>% filter(metabolomics_yn != "No, does not measure metabolomics or does not apply")
table(lilb4$data_availability)
#table(lilb4$data_avail_statement)

A systematic survey of the literature identified approximately 700 articles discussing HF, the microbiome, and metabolomics. Of these, 82 were primary studies of heart failure patients, 61 involved human adults, 23 included an untargeted metabolomics measure, and 3 studies had data that was usable and publicly accessible.

Looking at Metabolomics Data

A total of three studies qualified and had available data. Here is a description of each dataset and its simplified name for our purposes:

## [1] "The following metabolites were detected in both left ventricle heart tissue and stool of HF patients."
##  [1] "Thymidine"                  "Niacinamide"               
##  [3] "Glycine"                    "Leucine"                   
##  [5] "Uracil"                     "Serine"                    
##  [7] "Purine"                     "Phenylpyruvate"            
##  [9] "Taurine"                    "Ornithine"                 
## [11] "Cytidine"                   "Hypoxanthine"              
## [13] "Citrulline"                 "Serotonin"                 
## [15] "Xanthine"                   "5-hydroxyindoleacetic acid"
## [17] "Spermine"                   "Glucose-6-phosphate"       
## [19] "Uridine"                    "Inosine"                   
## [21] "Adenosine"                  "Xanthosine"
## [1] "The following metabolites were detected in breath and stool of HF PTs. "
## character(0)
## [1] "The following metabolites were detected in saliva and stool of HF patients. "
##  [1] "Methylamine"  "Phenol"       "Urea"         "Ethanolamine" "Glycine"     
##  [6] "Putrescine"   "Leucine"      "Uracil"       "Creatinine"   "Taurine"     
## [11] "Ornithine"    "Hypoxanthine" "Galactose"    "Sucrose"
## [1] "The following metabolites were detected in saliva and LV. "
##  [1] "Glucose"       "Hypoxanthine"  "Lactate"       "Pyruvate"     
##  [5] "Succinate"     "Taurine"       "Uracil"        "Glycine"      
##  [9] "Alanine"       "Proline"       "Valine"        "Isoleucine"   
## [13] "Leucine"       "Histidine"     "Phenylalanine" "Tyrosine"     
## [17] "Choline"       "Creatine"      "Ornithine"
## [1] "The following metabolites were detected in breath and LV of HF PTs. "
## [1] "Lactate"  "Pyruvate"

This is Supplemental Figure 1, and one of the first plots I created to investigate the data. We weren’t sure there would be much overlap of metabolites between studies, because the samples come from different body sites and were measured with different metabolomics methods. Despite this, a small fraction of metabolites can still be detected across multiple sample types and methods. We also get to see how different sample types and methods identify different numbers of metabolites.
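The cross-study overlaps printed above boil down to set intersections of per-sample-type metabolite lists. A toy version with invented vectors:

```r
## hypothetical metabolite ID lists, one per sample type
lv     <- c("Taurine", "Glycine", "Leucine", "Lactate")
stool  <- c("Taurine", "Glycine", "Urea")
saliva <- c("Taurine", "Glycine", "Leucine", "Phenol")

## metabolites detected in all three sample types
shared <- Reduce(intersect, list(lv, stool, saliva))
shared
```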

Now we want to investigate the metabolites that are differentially expressed for heart failure patients, specifically. Some data cleaning is included in this next code chunk as well.

Let’s go ahead and plot the metabolites that overlap between studies so we can see their distribution.

It turns out a heat map is not very informative here. Oh well, it happens! I’m keeping this here so that others can see what we tried and what we decided to put in the publication/preprint.

Let’s compare these metabolites between heart failure and control to see what is up- vs down-regulated. Data from the repos are already normalized (we checked!), so we won’t add another data transformation step here. We should also compare within-study, between-group beta dispersions to understand whether within-group diversity is similar across groups / organs (also an assumption for PCA).
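`vegan::betadisper()` plus `permutest()` is the usual route for the dispersion check mentioned above; as a self-contained illustration of the idea, here is a base-R sketch that compares each group's mean Euclidean distance to its own centroid on a toy matrix (data and group labels are invented):

```r
## toy data: 8 samples x 5 metabolites, two groups of 4
set.seed(1)
mat <- matrix(rnorm(40), nrow = 8)
grp <- rep(c("CTRL", "HF"), each = 4)

## per-group dispersion: mean Euclidean distance of samples to their group centroid
disp <- sapply(split(seq_len(nrow(mat)), grp), function(idx) {
  sub      <- mat[idx, , drop = FALSE]
  centroid <- colMeans(sub)
  mean(sqrt(rowSums(sweep(sub, 2, centroid)^2)))
})
disp  # similar values suggest comparable within-group dispersion
```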

Since I need to do a PCA four times, let’s write a function for it. Heads up to others: this bit of code takes a while to run locally. Your computer may sound like it is trying to take off to space.

## this function name is a bit misleading - "raw_abun" refers to the raw value from the repo, not a raw value from the metabolomics instruments.
## however, I'm not changing it because 1) I'm afraid of breaking something and 2) it is a relic of when we were trying to figure out what normalization methods had been applied to the data in the repos
## I was able to replicate the PCAs in the pubs with this method

conduct_pca_raw_abun <- function(studyID_filt, main_dat){
  d1wide <- main_dat %>%
    ungroup() %>%
    select(Samples, abun, sample, studyID) %>%
    filter(studyID == studyID_filt) %>%
    pivot_wider(names_from = "Samples", values_from = abun, values_fill = 0)

  d1_pca <- d1wide %>%
    select(where(is.numeric)) %>% # retain only numeric columns
    prcomp(scale = TRUE)          # do PCA on scaled data

  d1wide <- merge(d1wide, minimeta, by = "sample", all.x = T) %>% distinct()
  d1wide <- d1wide %>% mutate(study_group = paste0(studyID.x, "_",heart_cond))
  plot_name <- paste0(studyID_filt, " PCA Plot")
  pca_plot <- d1_pca %>%
    augment(d1wide) %>% # add original dataset back in
    ggplot(aes(.fittedPC1, .fittedPC2, color = heart_cond)) + 
    geom_point(size = 3, alpha = 0.7) +
    scale_color_manual(values = c(HF = "goldenrod3", CTRL = "grey45")  ) +
    theme_hc(12) + background_grid() +
    labs(title = plot_name, x = "PC1", y = "PC2" ) +
    stat_ellipse() +
    theme(legend.title = element_blank())

  arrow_style <- arrow(angle = 20, ends = "first", type = "closed", length = grid::unit(8, "pt"))
# plot rotation matrix
  arrow_plot <- d1_pca %>%
    tidy(matrix = "rotation") %>%
    pivot_wider(names_from = "PC", names_prefix = "PC", values_from = "value") %>%
    ggplot(aes(PC1, PC2)) +
    geom_segment(xend = 0, yend = 0, arrow = arrow_style) +
    geom_text(
      aes(label = column),
      hjust = 1, nudge_x = -0.02, 
      color = "#904C2F"
    ) +
    #xlim(-1.25, .5) + ylim(-.5, 1) +
    coord_fixed() + # fix aspect ratio to 1:1
   theme_minimal_grid(12) + labs(title = plot_name)

  percent_plot <- d1_pca %>%
    tidy(matrix = "eigenvalues") %>%
    ggplot(aes(as.integer(PC), percent)) +
    geom_col(fill = "#56B4E9", alpha = 0.8) +
    scale_y_continuous(
      labels = scales::percent_format(),
      expand = expansion(mult = c(0, 0.01))
    ) +
    theme_hc(12) +
    labs(title = "% Var.", y = "Percent Variance Explained", x = "PC") +
    #xlim(0,10) # ylim(0,10) +
    scale_x_continuous(limits = c(0, 11), breaks = c(0, 2, 4, 6, 8, 10))
  
  combo_plot <- plot_grid(pca_plot, percent_plot, rel_widths = c(2, 1))
  ## arrow_plot, percent_plot, and combo_plot are built above; returning the score plot only
  return(pca_plot)
}

p3 <- conduct_pca_raw_abun("D3", main_dat=trans_metab)
## Warning: Removed 34 rows containing missing values or values outside the scale range
## (`geom_col()`).
ggsave("figs/pca_wo_pv_d3.png", height = 7, width = 10)
p1 <- conduct_pca_raw_abun("D1", main_dat=trans_metab)
## Warning: Removed 85 rows containing missing values or values outside the scale range
## (`geom_col()`).
ggsave("figs/pca_wo_pv_d1.png", height = 7, width = 10)
#ggsave("d1_raw_abun.png")
p2_ebc <- conduct_pca_raw_abun("D2_EBC", main_dat=trans_metab)
ggsave("figs/pca_wo_pv_d2_ebc.png", height = 7, width = 10)
p2_sal <- conduct_pca_raw_abun("D2_SALIVA", main_dat=trans_metab)
## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_col()`).
ggsave("figs/pca_wo_pv_d2_saliva.png", height = 7, width = 10)

pca_all <- plot_grid(p1,p3, p2_ebc, p2_sal, ncol = 2, labels = "AUTO", byrow = TRUE)
pca_all

ggsave("figs/pca_wo_pv_all.png", dpi = 400, height = 10, width = 10)

Next we want to test the PCA results for significance to decide which datasets qualify for the OPLS-DA analysis. OPLS-DA will overfit data, and it is not always obvious when this has happened. By only running OPLS-DA on data that shows significant group structure in PCA, we reduce the risk of overfitting; the preprint cites several papers that discuss this in depth. The following code chunk is not knit, but the results are saved as part of the .Rdata file included in the GitHub repo, so feel free to check those numbers there. (Knitting kept getting stuck on the D3 PCA, even though it runs fine in interactive mode; I don’t know how to fix that, so this is the workaround I came up with.)
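One way to sketch the workaround described above: run the slow `PCAtest()` calls once interactively, cache the results, and have the knit load the cache instead of recomputing. The file path and stub object below are illustrative; the repo ships its own .Rdata file.

```r
## cache slow results so knitting can load instead of recompute
cache_file <- file.path(tempdir(), "pca_sig_tests.Rdata")  # illustrative path

if (file.exists(cache_file)) {
  load(cache_file)  # restores the saved test objects
} else {
  d3_pca_test_stub <- 42  # stand-in for a slow PCAtest() result
  save(d3_pca_test_stub, file = cache_file)
}
```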

## assumes columns are variables/features and rows are sampleIDs / obs
pca_sig_test <- function(studyID_filt, main_dat){
  
  d1_pca_cat <- main_dat %>% 
    ungroup() %>%
    dplyr::select(Samples, abun, sample, studyID, heart_cond)%>%
    filter(studyID == studyID_filt & abun > 0) %>%
    select(-studyID) %>% 
    pivot_wider(names_from = "Samples", values_from = abun, values_fill = 0)
    ## D2 studies had too few observations for some variables (e.g. Isovalerate, Ornithine) because
    ## the overall sample size was so small; the 20% threshold was not strong enough there, so that
    ## filtering was moved to a dedicated D2 function below.
  
  d1_pca <- d1_pca_cat %>% dplyr::select(where(is.numeric))
  ## our data is already scaled, so we are not using the scaling function below. keeping it here for others' reference though.
  #d1_pca_scaled <- scale(d1_pca)
  results <- PCAtest(d1_pca, nboot = 100, nperm = 100, plot = FALSE, counter = FALSE, varcorr = TRUE)
  
  # adonis/adonis2 are not appropriate here because they assume a distance matrix, which we do not have (unless we input the PCA itself?)
  # <- adonis2(d1_pcaa ~ heart_cond, data = d1_pca_cat, method='eu')
  ## pairwise.adonis2() is also inappropriate here because it assumes Bray-Curtis / microbiome data, which this is not.
  
  ## tested the PCA with UV scaling; it produces the same results, so it is hashed out (no point in running it). leaving for future reviewers.
  return(results)
  # inData <- main_dat %>%
  #   ungroup() %>% 
  #   filter(studyID == studyID_filt) %>%
  #  dplyr::select(Samples, abun, sample, studyID) %>%
  #uv_mat <- santaR:::scaling_UV(inData)
  #result2 <- PCAtest(uv_mat, nboot=100, nperm=100, plot = TRUE)
  ## produces identical results, leaving here only if a reviewer wants to test or see with a scaling method
}

d3_pca_test <- pca_sig_test("D3", trans_metab)
d1_pca_test <- pca_sig_test("D1", trans_metab)
## hash these out when not creating the document for GitHub - they take a while to run, and I recommend reviewers do the same

## assumes columns are variables/features and rows are sampleIDs / obs
pca_sig_test_d2 <- function(studyID_filt, main_dat){
  
  d1_pca_cat <- main_dat %>% 
    ungroup() %>%
    dplyr::select(Samples, abun, sample, studyID, heart_cond)%>%
    filter(studyID == studyID_filt & abun > 0) %>%
    ## special filtering for D2: a metabolite must appear in at least 5 samples,
    ## because otherwise N is too small and the math doesn't like it
    group_by(Samples) %>%
    filter(n() >= 5) %>%  # keep full rows for metabolites seen in >= 5 samples
    ungroup() %>%
    select(-studyID) %>% 
    pivot_wider(names_from = "Samples", values_from = abun, values_fill = 0)
    ## the hardcoded drops (Isovalerate, Ornithine, Isoleucine, Isopropanol) are no longer
    ## needed because of the >= 5 observation filter above
  
  d1_pca <- d1_pca_cat %>% dplyr::select(where(is.numeric))
  ## our data is already scaled, so we are not using the scaling function below. keeping it here for others' reference though.
  results <- PCAtest(d1_pca, nboot = 100, nperm = 100, plot = FALSE, counter = FALSE, varcorr = TRUE)
  return(results)
  # inData <- main_dat %>%
  #   ungroup() %>% 
  #   filter(studyID == studyID_filt) %>%
  #  dplyr::select(Samples, abun, sample, studyID) %>%
  #uv_mat <- santaR:::scaling_UV(inData)
  #result2 <- PCAtest(uv_mat, nboot=100, nperm=100, plot = TRUE)
  ## produces identical results, leaving here only if a reviewer wants to test or see with a scaling method
}
d2_saliva_pca_test <- pca_sig_test_d2("D2_SALIVA", trans_metab)
d2_ebc_pca_test <- pca_sig_test_d2("D2_EBC", trans_metab)#

## sanity check: inspect the D2 saliva rows that feed the test above
trans_metab %>% ungroup() %>%
    dplyr::select(Samples, abun, sample, studyID, heart_cond) %>%
    filter(studyID == "D2_SALIVA" & abun > 0)

PCA revealed significant differences in group structure for D1 and D3 (observed psi and phi greater than the null, p < 0.01). While D2 EBC samples had a single significant principal component, this is likely due to confounding, such as multicollinearity among metabolic pathways, rather than meaningful group differences, since significance appeared in only one dimension (PC1). Further, D2 saliva samples had uneven beta dispersion across groups (within-group variance), violating PCA’s assumption of multivariate homogeneity of group dispersions, so no conclusions of significance can be drawn from the D2 PCA results. This is Table 1 of the publication/preprint.

Next, we want to investigate each metabolite that is significant.

univariate_testing <- function(studyID_filt, main_dat){
    y = data.frame("pval", "metabolite", "studyID", "adj_pval_fdr")
    colnames(y) <- c("pval", "metabolite", "studyID", "adj_pval_fdr")
    smol_dat <- main_dat %>%
        ungroup() %>%
        filter(studyID == studyID_filt) 
    metab_list <- as.list(unique(smol_dat$Samples))
    
    for (i in metab_list) {
      new_line = data.frame("pval", "metabolite", "studyID", "adj_pval_fdr")
      univar_testing <- smol_dat %>%
        filter(Samples == i ) %>%
        distinct() %>%
        mutate(heart_cond = as.factor(heart_cond))
      ## only test metabolites observed in both groups (i.e. both levels of heart_cond present)
      if (length(forcats::fct_unique(univar_testing$heart_cond)) == 2) {
        x_abun <- univar_testing$abun
        y_heart_cond <- univar_testing$heart_cond
        w <- wilcox.test(x_abun ~ y_heart_cond, alternative = "two.sided", exact = TRUE)
        new_line$pval <- as.numeric(w$p.value)
        new_line$metabolite <- i
        new_line$studyID <- studyID_filt
        new_line <- new_line %>% select(pval, metabolite, studyID) %>%
          mutate(adj_pval_fdr = p.adjust(pval, method = "fdr", n = length(metab_list)))
        y <- rbind(y, new_line)
      }
    }
  ## write the results table to file
  fname = paste0("dat_file/univariate_testing_", studyID_filt, ".txt")
  write.table(y, file = fname, append = FALSE, quote = FALSE, row.names = FALSE, sep = "\t")
}
#wilcox_test(formula = abun ~ heart_cond, alternative = "two.sided", data = trans_metab)

univariate_testing("D2_SALIVA", trans_metab)
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
univariate_testing("D2_EBC", trans_metab)
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
univariate_testing("D3", trans_metab)
univariate_testing("D1", trans_metab)
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties

This univariate testing function produces a file with raw and FDR-adjusted p-values for each metabolite in each study. There is some debate about the appropriate adjusted p-value cutoff. Some say 0.05 is a fair cutoff for adjusted p-values with a targeted hypothesis (like only being interested in microbially mediated metabolites), while others say 0.005 is necessary, though that is more in the context of biomarker identification. We’ve seen all of these strategies used in the literature, so we hope other researchers can use this data/file to draw their own conclusions about the significance of metabolites not discussed in our publication, based on whatever p-value threshold they consider appropriate. This is Table 2 of the publication/preprint.
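As an illustration, readers can apply their preferred cutoff to the written file; the data frame below stands in for the file's columns, with invented values:

```r
## stand-in for one study's univariate_testing_*.txt output (values invented)
res <- data.frame(
  metabolite   = c("Taurine", "Glycine", "Urea"),
  pval         = c(0.001, 0.004, 0.030),
  adj_pval_fdr = c(0.010, 0.040, 0.120)
)

## count of significant metabolites under two commonly cited cutoffs
sum(res$adj_pval_fdr < 0.05)   # lenient cutoff
sum(res$adj_pval_fdr < 0.005)  # stringent biomarker-style cutoff
```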

Now that we have generated univariate statistics, let’s make the classic volcano plot.

make_volc_plots <- function(studyID_filt, main_dat){
  fname = paste0("dat_file/univariate_testing_", studyID_filt, ".txt")
  d <- read.csv(file = fname, sep = "\t", skip = 1, row.names = NULL)  ## skip the written header; the initialization row supplies the column names

  d_sig <- d %>% filter(adj_pval_fdr < 0.05)
  sig_metas <- main_dat %>%
    filter(studyID ==  studyID_filt) %>%
    filter(Samples %in% d_sig$metabolite) %>%
    ggplot(aes(x = heart_cond, y = abun, fill = heart_cond)) +
    geom_boxplot(outlier.shape = NA, alpha = 0.5) +
    geom_jitter(height = 0, width = 0.2, alpha = 0.7) +
    theme_bw() +
    scale_fill_manual(values = cbbPalette) +
    facet_wrap(.~Samples, scales = "free") +
    labs(title = paste0("Significant Metabolites, ", studyID_filt)) +
    theme_bw(base_size = 20)

  plotname = paste0("significant_metabolites_", studyID_filt, ".png")
  ggsave(plotname, height = 15, width = 20, plot = sig_metas)
  
  ## volcano plot
  volc_plot <- main_dat %>%
    filter(studyID == studyID_filt) %>%
    #ungroup() %>%
    group_by(heart_cond, Samples) %>% 
    summarise(value = median(abun) ) %>%
    pivot_wider(names_from = heart_cond, values_from = value, values_fill = 0.000001) %>%
    mutate(fold_change = log2(HF) - log2(CTRL)) %>%
    merge(d, by.x = "Samples", by.y = "metabolite") %>%
    mutate(signi = ifelse(Samples %in% d_sig$metabolite, "significant", "not significant")) %>%
    ggplot(aes(x = fold_change, y = -log10(adj_pval_fdr), color = signi)) +
    scale_color_manual(values = cbbPalette) +
    labs(title = paste0(studyID_filt," Volcano Plot"), color = " ") +
    theme_pubclean()  +
    geom_vline(xintercept = 0, linetype = "dashed") +
    geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
    geom_label_repel(aes(label = Samples),
                  box.padding   = 0.35, 
                  point.padding = 0.5,
                  segment.color = 'grey50',
                  max.overlaps = 15) +
    theme_pubclean(base_size = 20) +
    geom_point()# +
    #scale_x_continuous(expand = expansion(add = 0.5)) +
    #scale_y_continuous(expand = expansion(add = c(0,3))) 

  plot2name = paste0("volcanoplot_", studyID_filt, ".png")
  ggsave(plot2name, plot = volc_plot)
  #print(d_sig)
  print(volc_plot)
}

v1 <- make_volc_plots("D1", trans_metab)
## `summarise()` has grouped output by 'heart_cond'. You can override using the
## `.groups` argument.
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Saving 7 x 5 in image
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps


v1
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

ggsave("figs/volcano_plt_d1.png", dpi = 300)
## Saving 7 x 5 in image
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
v3 <- make_volc_plots("D3", trans_metab)
## `summarise()` has grouped output by 'heart_cond'. You can override using the
## `.groups` argument.
## Warning: ggrepel: 113 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Saving 7 x 5 in image
## Warning: ggrepel: 112 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps


v3
## Warning: ggrepel: 113 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

ggsave("figs/volcano_plt_d3.png", dpi = 300)
## Saving 7 x 5 in image
## Warning: ggrepel: 112 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
plot_grid(v1,v3, ncol = 2, labels = "AUTO")
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 116 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

ggsave("figs/volcano_plt_d13.png", dpi = 300)
## Saving 7 x 5 in image
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 116 unlabeled data points (too many overlaps). Consider increasing max.overlaps
#make_volc_plots("D2_EBC", trans_metab)
#make_volc_plots("D2_SALIVA", trans_metab)
## D2 studies have no significant metabolites after adjustment, so these are not run

Orthogonal partial least squares discriminant analysis (OPLS-DA) was conducted with the R package ropls, and only for the studies with significant group differences in PCA (D1 and D3), to examine which features contribute most to group differences. OPLS-DA is prone to overfitting, and this risk is reduced by only running OPLS-DA on data whose group differences were already significant by PCA. VIPs from the OPLS-DA are not reported, as VIPs are not useful for OPLS modeling of a single response (i.e., heart failure status). Further results for specific metabolites from OPLS-DA are not reported, but are included in this code output, as univariate testing with stringent FDR correction and p-value filtering is more appropriate for this analysis.

set.seed(20150615) ## if you know why i set the seed to this date, you get a special prize##
## this is repeated from the set up just to be sure that the seed is right

## duplicate the D2 metadata under the D2_SALIVA studyID so it matches the saliva subset
minimeta <- rbind(minimeta,
      minimeta %>% 
        filter(studyID == "D2") %>% 
        mutate(studyID = "D2_SALIVA"))

conduct_oplsda <- function(studyID_filt, main_dat){
  metab_wide <- main_dat %>%
    ungroup() %>%
    pivot_wider(names_from = Samples, values_from = abun, values_fill = 0)

  inMeta <- minimeta %>% filter(studyID == studyID_filt & sample %in% metab_wide$sample) %>% arrange(sample) %>% select(-studyID)
  colnames(inMeta) <- c("ind", "group")
  
  ## prep dataframe
  ## note: this pulls from the global `metab`, not the `main_dat` argument
  inData <- 
    metab %>% ungroup() %>% filter(studyID == studyID_filt) %>%
    dplyr::select(Samples, abun, sample, studyID) %>%
    distinct() %>%
    filter(sample %in% inMeta$ind) %>%
    mutate(sample = paste0(sample, "_", studyID)) %>%
    pivot_wider(names_from = "Samples", values_from = "abun") %>% 
    ungroup() %>%
    dplyr::select(-studyID, -sample)
  
  opls.pca <- opls(inData) ## unsupervised PCA first
  vect_y <- inMeta$group
  fname <- paste0("figs/oplsda_model_", studyID_filt, ".svg")
  
  ## fit on an odd/even train/test split first to check predictive performance
  hf.oplsda <- opls(inData, vect_y, predI = 1, orthoI = NA, subset = "odd", plotSubC = studyID_filt)
  ggsave(fname)
  
  ## refit on all samples for the reported model
  hf.oplsda <- opls(inData, vect_y, predI = 1, orthoI = NA, plotSubC = studyID_filt)
  
  plot(hf.oplsda,
       typeVc = "x-score",
       parAsColFcVn = vect_y,
       fig = fname, plotSubC = studyID_filt,
       parPaletteVc = c("grey30", "goldenrod2"))
}
## prep inMeta for RDS

conduct_oplsda("D3", trans_metab)
## PCA
## 44 samples x 126 variables
## standard scaling of predictors
##       R2X(cum) pre ort
## Total    0.529   5   0
## Warning: 'permI' set to 0 because train/test partition is selected

## OPLS-DA
## 23 samples x 126 variables and 1 response
## standard scaling of predictors and response(s)
##       R2X(cum) R2Y(cum) Q2(cum)  RMSEE RMSEP pre ort
## Total    0.509    0.995   0.809 0.0376 0.242   1   3
## Saving 7 x 5 in image
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 116 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

## OPLS-DA
## 44 samples x 126 variables and 1 response
## standard scaling of predictors and response(s)
##       R2X(cum) R2Y(cum) Q2(cum) RMSEE pre ort pR2Y  pQ2
## Total    0.312    0.893   0.808 0.161   1   1 0.05 0.05

conduct_oplsda("D1", trans_metab)
## Warning: The maximum number of components for the automated mode (10) has been
## reached whereas the cumulative variance 44% is still less than 50%.
## PCA
## 95 samples x 561 variables
## standard scaling of predictors
##       R2X(cum) pre ort
## Total    0.441  10   0
## Warning: 'permI' set to 0 because train/test partition is selected

## OPLS-DA
## 48 samples x 561 variables and 1 response
## standard scaling of predictors and response(s)
##       R2X(cum) R2Y(cum) Q2(cum)  RMSEE RMSEP pre ort
## Total    0.195    0.956   0.691 0.0948 0.442   1   2
## Saving 7 x 5 in image
## Warning: ggrepel: 549 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 116 unlabeled data points (too many overlaps). Consider increasing max.overlaps

## OPLS-DA
## 95 samples x 561 variables and 1 response
## standard scaling of predictors and response(s)
##       R2X(cum) R2Y(cum) Q2(cum)  RMSEE pre ort pR2Y  pQ2
## Total    0.229    0.978   0.734 0.0665   1   4 0.05 0.05

Understanding OPLS-DA outputs: R2X(cum) is the fraction of variance in the predictors explained by the model, R2Y(cum) is the fraction of variance in the response explained, and Q2(cum) estimates predictive ability from cross-validation (a Q2 close to R2Y suggests the model generalizes; a large gap suggests overfitting). RMSEE and RMSEP are the estimation and prediction errors, pre and ort are the numbers of predictive and orthogonal components, and pR2Y/pQ2 are permutation p-values for those statistics.
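As a quick illustration of how R2Y and Q2 relate, here is a hedged Python sketch (the helper name and toy numbers are mine, not from ropls): both are one minus a residual sum of squares over the total sum of squares, with R2Y using in-sample fitted values and Q2 using cross-validated (held-out) predictions.

```python
# Illustrative only: mirrors the definitions behind ropls' R2Y(cum) / Q2(cum).
def r2(y, y_hat):
    """1 - residual SS / total SS; with CV predictions this is Q2."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - rss / tss

y = [0, 0, 1, 1]                  # toy binary response (e.g. CTRL vs HF)
fitted = [0.1, 0.05, 0.9, 0.95]   # in-sample predictions -> R2Y-like
cv_pred = [0.3, 0.2, 0.7, 0.8]    # held-out predictions  -> Q2-like (lower)
```

The held-out statistic is always the more honest of the two, which is why the permutation p-values (pR2Y, pQ2) above test both.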

Pathway Analysis

Now that we have data tested for significance, we want to see which pathways are significant within these data. I tested a couple of different methods for this. The first uses the categories in the csv sig_metabolites_univariate_analysis_fdr, which came from my interpretation of PubChem categories. I decided this wasn't systematic enough, so we also used KEGG/RAST categories; however, I still think the PubChem-based categories are useful, as they give some bland categories a human touch, so I am including them here.

cats <- read.csv("dat_file/sig_metabolites_univariate_analysis_fdr - Sheet1.csv")
## merge categories for abundances for some plotting

unidat <- metab %>% 
  select(sample, Samples, abun, studyID, heart_cond) %>%
  merge(cats, by.x = "Samples", by.y = "metabolite") %>%
  filter(!grepl("QC", sample)) %>%
  mutate(study_group = as.factor(paste0(studyID.y, "_", heart_cond)))

unidat$studyID <- unidat$studyID.x
unidat <- unidat %>%
  select(-studyID.x, -studyID.y) %>%
  mutate(adj.p = ifelse(adj_pval_fdr >= 0.001, paste0("= ", adj_pval_fdr), "< 0.001"))
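The adj.p column built above follows a simple display rule: FDR-adjusted p-values below 0.001 are reported as "< 0.001", everything else as the value itself. A hedged Python sketch of the same rule (function name and floor are mine):

```python
def format_adj_p(p, floor=0.001):
    """Mirror of the labeling rule above: below the floor report '< floor',
    otherwise report '= value'."""
    return f"< {floor}" if p < floor else f"= {p}"
```

This keeps plot subtitles readable without implying more precision than the FDR correction supports.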


for (i in unique(unidat$Samples)) {
  textdat <- unidat %>%
    filter(Samples == i) %>%
    select(adj.p, study_group)
  
  a <- unidat %>%
    filter(Samples == i) %>%
    ggplot(aes(x = Samples, y = abun, fill = study_group)) +
    geom_boxplot(outlier.shape = NA, alpha = 0.5) +
    geom_point(position = position_jitterdodge(), alpha = 0.7) +
    theme_bw(base_size = 10) +
    facet_wrap(. ~ pathway, scales = "free") +
    labs(title = paste0(i), x = "",
         subtitle = paste0("adj. p ", textdat$adj.p)) +
    scale_fill_manual(values = c("D1_CTRL" = "grey30",
                                 "D3_CTRL" = "grey20",
                                 "D1_HF" = "goldenrod3",
                                 "D3_HF" = "goldenrod1")) +
    theme(legend.position = "top", legend.title = element_blank())
  
  print(a)
  fname <- paste0("univariate_sig_plots/", i, ".png")
  ggsave(plot = a, filename = fname, dpi = 300, height = 4, width = 3)
}

Lucky for us, I was able to work some magic (use the MetaboAnalyst website, metaboanalyst.ca) and map the KEGG pathways to the metabolite common names. I wasn't able to find a way to do this within R or via the command line, so please let me know if you know of a better way to do this.

library(FELLA)    ## KEGG network-diffusion pathway enrichment
library(pathview) ## KEGG pathway diagrams
kegg <- read.csv("dat_file/map_common_name_to_kegg.csv.csv")
pathdat <- merge(unidat, kegg, by.x = "Samples", by.y = "Query", all.x = TRUE)
#pathdat %>%
#  filter(adj_pval_fdr < 0.05) %>%
#  select(sample,heart_cond, Samples, abun, KEGG)
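The merge above is a left join: every metabolite keeps its row, and those without a MetaboAnalyst match get a missing KEGG ID (that is what all.x = TRUE does). A hedged Python sketch of the same behavior; the example compound IDs are for illustration only:

```python
# Left join of metabolite common names onto KEGG compound IDs.
# Unmatched names stay in the result with a missing (None) ID,
# mirroring merge(..., all.x = TRUE) in R.
kegg_map = {"Creatinine": "C00791", "Taurine": "C00245"}  # illustrative rows

def left_join_kegg(metabolites, mapping):
    """Return (name, kegg_id_or_None) for every input metabolite."""
    return [(m, mapping.get(m)) for m in metabolites]

rows = left_join_kegg(["Creatinine", "Taurine", "UnknownFeature_42"], kegg_map)
```

Keeping the unmatched rows matters here, because the fraction of metabolites that fail to map is itself reported below.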


graph <- buildGraphFromKEGGREST( organism = "hsa", filter.path = NULL) ## only needs to run once per session for users
## Building through KEGGREST...
## Available gene annotations: ncbi-geneid, ncbi-proteinid. Using ncbi-geneid
## Done.
## Building graph...
## Done.
## Pruning graph...
## Current weight: 1 out of 4...
## Current weight: 2 out of 4...
## Current weight: 3 out of 4...
## Current weight: 4 out of 4...
## Done.
tmpdir <- file.path(tempdir(), "my_database2")
# Make sure the database does not exist from a former build;
# otherwise FELLA will raise an error rather than overwrite an existing database
unlink(tmpdir, recursive = TRUE)

buildDataFromGraph(keggdata.graph = graph,
                    databaseDir = tmpdir, 
                    internalDir = FALSE, 
                    matrices = "none", 
                    normality = "diffusion", 
                    niter = 100)
## Computing probabilities for random subgraphs... (this may take a while)
## Directory C:\Users\emily\AppData\Local\Temp\RtmpSe6ZN8\my_database2 does not exist. Creating it...
## Done.
## Done.
## Computing diffusion.matrix... (this may take a while and use some memory)
## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.1 GiB
## Computing diffusion.rowSums...
## Done.
fella.data <- loadKEGGdata(databaseDir = tmpdir, 
                           internalDir = FALSE,
                           loadMatrix = "none")
## Loading KEGG graph data...
## Done.
## Loading hypergeom data...
## Loading matrix...
## 'hypergeom.matrix.RData' not present in:C:\Users\emily\AppData\Local\Temp\RtmpSe6ZN8\my_database2/hypergeom.matrix.RData. Hypergeometric test won't execute.
## Done.
## Loading diffusion data...
## Loading matrix...
## 'diffusion.matrix.RData' not loaded. Simulated permutations may execute slower for diffusion.
## Done.
## Loading rowSums...
## Done.
## Loading pagerank data...
## Loading matrix...
## 'pagerank.matrix.RData' not loaded. Simulated permutations may execute slower for pagerank.
## Done.
## Loading rowSums...
## 'pagerank.rowSums.RData' not present in:C:\Users\emily\AppData\Local\Temp\RtmpSe6ZN8\my_database2/pagerank.rowSums.RData. Z-scores won't be available for pagerank.
## Done.
## Data successfully loaded.
pathdat %>%
  group_by(studyID) %>%
  summarize(unique_kegg = length(unique(KEGG)))
## # A tibble: 3 × 2
##   studyID   unique_kegg
##   <chr>           <int>
## 1 D1                 38
## 2 D2_SALIVA           1
## 3 D3                 19
## not many things mapping to kegg pathways so different pathway analysis below 
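The summarize above is just a per-study count of distinct KEGG IDs, which is how we can see the mapping is too thin for some studies (one ID for D2_SALIVA). A hedged Python equivalent (function name and toy rows are mine); unlike length(unique(...)) in R, this version skips missing IDs rather than counting NA as a level:

```python
from collections import defaultdict

def unique_kegg_per_study(rows):
    """rows: (studyID, kegg_id) pairs -> {studyID: number of distinct IDs}.
    Missing IDs (None) are excluded from the count."""
    seen = defaultdict(set)
    for study, kegg in rows:
        if kegg is not None:
            seen[study].add(kegg)
    return {s: len(ids) for s, ids in seen.items()}

counts = unique_kegg_per_study([("D1", "C00791"), ("D1", "C00245"),
                                ("D3", "C00791"), ("D1", "C00791")])
```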

Great! The above only needs to run a single time, so we are starting a new code block.

cur_path <- getwd()
pathway_analysis <- function(studyID_filt, main_dat){
  
  filt_dat <- main_dat %>% filter(studyID == studyID_filt)
  cmp.list <- filt_dat$KEGG  
  analysis.enrich <- enrich(compounds = cmp.list, 
                            data = fella.data, 
                            method = "diffusion", 
                            approx = "simulation")

  analysis.enrich %>%
    getInput() %>%
    getName(data = fella.data)

  plot(analysis.enrich, 
     method = "diffusion", 
     data = fella.data,
     nlimit = 250,
     plotLegend = FALSE)

    ## subnetwork analysis
#  g.enrich <- generateResultsGraph(object = analysis.enrich,
#                              data = fella.data,  method = "diffusion")
#  tab.enrich <- generateResultsTable(object = analysis.enrich,
#                                  data = fella.data, 
#                                  method = "diffusion")
  plot(analysis.enrich, 
     method = "diffusion", 
     data = fella.data,
     nlimit = 50,
     plotLegend = TRUE)

  myTable <- generateResultsTable(
    object = analysis.enrich, 
    method = "diffusion", 
    #threshold = 0.05, ## optional p-value threshold
    data = fella.data)
  #print(knitr::kable(head(myTable, 20))) ## deprecated since the table is written to file now
  k_name <- paste0("dat_file/kable_tbl_", studyID_filt, ".csv")
  myTable$studyID <- studyID_filt
  write.csv(myTable, file = k_name, row.names = TRUE, quote = FALSE)
  
  plot(
    x = analysis.enrich, 
    method = "diffusion", 
    main = "Enrichment using the diffusion analysis in FELLA", 
    threshold = 0.01, 
    data = fella.data 
  )
  ## a lil something to automatically visualize the top 10 pathways per study
  top10 <- myTable$KEGG.id[1:10]
  setwd(paste0(cur_path, "/dat_file/"))
  for (i in top10){
    i <- as.character(i)
    fname <- paste0(studyID_filt, "_", i)
    pathview(cpd.data = cmp.list, species = "hsa", pathway.id = i, out.suffix = fname, kegg.native = FALSE)
  }
  setwd(cur_path) ## restore the working directory
}

pathway_analysis("D1", pathdat) ## takes a huge amount of processing, so the other calls are left out
## No background compounds specified. Default background will be used.
## Warning in defineCompounds(compounds = compounds, compoundsBackground =
## compoundsBackground, : Some compounds were introduced as affected but they do
## not belong to the background. These compounds will be excluded from the
## analysis. Use 'getExcluded' to see them.
## Running diffusion...
## Estimating p-values by simulation.
## Diffusion matrix not loaded. Simulations will be slower...
## 10%
## 20%
## 30%
## 40%
## 50%
## 60%
## 70%
## 80%
## 90%
## 100%
## Done.
## 611 nodes below the threshold have been limited to 250 nodes.
## 611 nodes below the threshold have been limited to 50 nodes.

## Writing diffusion results...
## Done.

## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa01040.D1_hsa01040.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa01232.D1_hsa01232.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04022.D1_hsa04022.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Warning: None of the genes or compounds mapped to the pathway!
## Argument gene.idtype or cpd.idtype may be wrong.
## Info: Edge type: PPrel without subtype sepecified!
## Warning in .subtypeDisplay(object): Given subtype 'unknown' is not found!
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04151.D1_hsa04151.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04540.D1_hsa04540.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04924.D1_hsa04924.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Warning in .subtypeDisplay(object): Given subtype 'missing interaction' is not found!
## [last warning repeated 28 times]
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa05012.D1_hsa05012.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa05032.D1_hsa05032.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa05034.D1_hsa05034.pdf
## Info: Downloading xml files for hsaM00037, 1/1 pathways..
## Warning in download.file(xml.url, xml.target, quiet = T): cannot open URL
## 'https://rest.kegg.jp/get/hsaM00037/kgml': HTTP status was '400 Bad Request'
## Warning: Download of hsaM00037 xml file failed!
## This pathway may not exist!
## Warning: Failed to download KEGG xml/png files, hsaM00037 skipped!

setwd(cur_path)
pathway_analysis("D3", pathdat)
## No background compounds specified. Default background will be used.
## Warning in defineCompounds(compounds = compounds, compoundsBackground =
## compoundsBackground, : Some compounds were introduced as affected but they do
## not belong to the background. These compounds will be excluded from the
## analysis. Use 'getExcluded' to see them.
## Running diffusion...
## Estimating p-values by simulation.
## Diffusion matrix not loaded. Simulations will be slower...
## 10% ... 100%
## Done.
## 672 nodes below the threshold have been limited to 250 nodes.
## 672 nodes below the threshold have been limited to 50 nodes.

## Writing diffusion results...
## Done.

## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa00240.D3_hsa00240.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa01232.D3_hsa01232.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Note: hsa02010 not rendered, 0 or 1 connected nodes!
## Try "kegg.native=T" instead!
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04022.D3_hsa04022.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04080.D3_hsa04080.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04540.D3_hsa04540.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04919.D3_hsa04919.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04923.D3_hsa04923.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa04924.D3_hsa04924.pdf
## 'select()' returned 1:1 mapping between keys and columns
## Info: Working in directory C:/Users/emily/OneDrive/Desktop/academia_sinica/meta_analysis_analysis/dat_file
## Info: Writing image file hsa05032.D3_hsa05032.pdf

setwd(cur_path)


pathdat <- pathdat %>%
  mutate(db_id = ifelse(!is.na(HMDB), paste0("hmdb:", HMDB), paste0("pubchem:", PubChem)))
# (A fuller fallback through ChEBI and KEGG IDs was drafted here but
# ultimately not used.)
pathdat %>%
  select(db_id) %>%
  distinct() %>%
  write_csv(file = "dat_file/data_full_annotation.csv")
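The commented-out lines above sketch a fuller ID-priority chain that was never finished. As a hedged sketch (not the published code), and assuming the same column names (`HMDB`, `ChEBI`, `KEGG`, `PubChem`), that fallback is cleaner with `dplyr::case_when`, which avoids the nested `ifelse` parentheses that broke here:

```r
# Sketch only: prioritize HMDB, then ChEBI, then KEGG, then PubChem
# when building a RAMP-compatible ID.
library(dplyr)

annotate_db_id <- function(df) {
  df %>%
    mutate(db_id = case_when(
      !is.na(HMDB)  ~ paste0("hmdb:", HMDB),
      !is.na(ChEBI) ~ paste0("chebi:", ChEBI),
      !is.na(KEGG)  ~ paste0("kegg:", KEGG),
      TRUE          ~ paste0("pubchem:", PubChem)
    ))
}
```

`case_when` evaluates its conditions in order, so the first non-missing ID wins.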

I then read that file into RAMP (https://rampdb.nih.gov/pathways-from-analytes) to get new pathway labels, since too many metabolites did not map to anything in KEGG, and reran the pathway analysis. However, these results were not particularly more useful or informative than those above, so they are not included in the publication; KEGG is also the more standard choice, so I thought it better to report those results. I still include this code here so others can see what we did, what we decided to include, and how results may vary with different bioinformatic methods (in our case, no large variance).

mop <- read.csv("dat_file/fetchPathwaysFromAnalytes-download-ramp.tsv", sep = "\t")
# mop:     commonName, inputId
# pathdat: db_id matches mop$inputId

newpath <- merge(mop, pathdat, by.x = "inputId", by.y = "db_id", all.y = T)
newpath <- newpath %>% 
  mutate(studyID = ifelse(studyID == "D1", "D1_new",
                          ifelse(studyID == "D3", "D3_new", studyID))) %>%
  filter(pathwaySource=="kegg")
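As a toy illustration (made-up rows, not the study data) of why `all.y = TRUE` matters in the merge above: metabolites that RAMP did not map are kept with an `NA` pathway name rather than silently dropped.

```r
# Toy data: one metabolite maps to a RAMP pathway, one does not
mop_toy <- data.frame(inputId = "hmdb:HMDB0000001",
                      pathwayName = "Caffeine metabolism")
pathdat_toy <- data.frame(db_id = c("hmdb:HMDB0000001", "pubchem:702"),
                          abun = c(1.2, 3.4))

merged <- merge(mop_toy, pathdat_toy,
                by.x = "inputId", by.y = "db_id", all.y = TRUE)
# merged has 2 rows; the unmapped metabolite keeps its abundance
# but carries an NA pathwayName
```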

# Helper function for string wrapping.
# Default 20-character target width.
swr <- function(string, nwrap = 20) {
  paste(strwrap(string, width = nwrap), collapse = "\n")
}
swr <- Vectorize(swr)
newpath$pathwayName <- swr(newpath$pathwayName)
pathways <- unique(newpath$pathwayName)
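For example, applied to a typical long pathway name, `swr` inserts literal newline characters. That is handy for facet labels, but worth remembering if the wrapped name later ends up in a file path. (Self-contained demo; the pathway name is just an illustration.)

```r
# Same wrapping helper as above, redefined so this demo stands alone
swr <- Vectorize(function(string, nwrap = 20) {
  paste(strwrap(string, width = nwrap), collapse = "\n")
})

wrapped <- swr("Glycine, serine and threonine metabolism")
# wrapped now contains embedded "\n" characters splitting the long name;
# a short name like "Purines" is returned unchanged
```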

# Sanity check: every row's wrapped pathway name is in the unique list
all(newpath$pathwayName %in% pathways)
## [1] TRUE
#######################
for (p in pathways){
    a <- newpath %>%
      filter(pathwayName == p) %>%
      ggplot(aes(x = Samples, y = abun, fill = study_group)) +
      geom_boxplot(outlier.shape = NA, alpha = 0.5) +
      geom_point(position = position_jitterdodge(), alpha = 0.7) +
      theme_bw(base_size = 20) +
      facet_wrap(.~pathwayName, scales = "free", labeller = labeller(grp = label_wrap_gen(25))) +
      scale_fill_manual(values = c("D1_CTRL" = "grey30",
                                  "D3_CTRL" = "grey20",
                                  "D1_HF" = "goldenrod3",
                                  "D3_HF" = "goldenrod1")) +
      labs(title = "Significant Metabolites",
           subtitle = paste0("adj. p ", textdat$adj.p),
           x = " ") +
      theme(legend.position = "top", legend.title = element_blank(),
            axis.text.x = element_text(angle = 45, hjust = 1))

      print(a)
      # swr() inserted newlines into the pathway names; strip them (and
      # any slashes) so the plot filename is a valid path
      fname <- paste0("univariate_sig_plots_ramp_pathways/kegg/",
                      gsub("[/\n]", "_", p), ".png")
      ggsave(plot = a, filename = fname, dpi = 300, height = 7, width = 5)
}
## Warning in grDevices::dev.off(): agg could not write to the given file

## (warning repeated for each saved plot)

And that’s a wrap! A big thank you to the authorship team: everyone worked very hard to review the papers screened for this meta-analysis and provided valuable input as we examined preliminary results. I am especially grateful to Dr. Patrick Hsieh, my postdoc mentor and the PI of this project, who provided input, support, and discussion throughout.